??? (TODO), 2019

Today's schedule

TODO: nachträglich anpassen / nummerierung!

  1. Introduction
  2. A short introduction to plotting with ggplot2
  3. Hello world: Plotting points on a world map
  4. Sources and file formats for geographic data

– lunch break –

  1. A short introduction to data linkage with dplyr
  2. Combining data and making a choropleth map
  3. Geo-coding and reverse geo-coding via Google Maps API
  4. Coordinate reference systems and projections

Introduction

About me

TODO

Aims of today's workshop

TODO

full flexibility

but even things that look simple are actually quite complex

A short introduction to plotting with ggplot2

Plotting with ggplot2

There are three basic steps for constructing plots with ggplot2:

  1. Supply a data set you want to plot to ggplot().
  2. Define an aesthetic mapping with aes().
    This describes how variables of your data are mapped to visual properties, e.g. variable "age" is plotted on the x-axis and "income" on the y-axis.
  3. Add layers of geoms (geometrical objects) that describe which graphical primitives to use (e.g. points in a scatter plot or bars in a bar plot).

Additionally, you can further change the appearance of your plot by:

  • altering the scales (e.g. use a logarithmic scale, modify display of factors, etc.)
  • defining facets → create small multiples, each showing a different subset of the data
  • changing the overall appearance of the plot by adjusting its theme (e.g. change background color, rotate axis labels, etc.)

You combine all these steps with a +.

General concepts behind ggplot2

ggplot2 components - 1

General concepts behind ggplot2

ggplot2 components - 2

General concepts behind ggplot2

ggplot2 components - 3

General concepts behind ggplot2

ggplot2 components - 4

General concepts behind ggplot2

ggplot2 components - 5

General concepts behind ggplot2

ggplot2 components - 6

General concepts behind ggplot2

ggplot2 components - 7

Hello world: Plotting points on a world map

Coordinate reference systems in brief

We usually work with two-dimensional vector data: Points located within a coordinate reference system (CRS).

  • the point -73.94°, 40.6° is represented in the WGS84 CRS: most popular world coord. system (e.g. used by GPS)
  • this CRS is ellipsoidal; its components are given in degrees:
    • -73.94° is the longitude: about 74° West from the meridian
    • 40.6° is the latitude: about 41° North from the equator

Coordinate reference systems in brief

Let's put this point into context:

  • points can be connected to form other geometrical shapes such as lines or polygons
  • points and shapes can represent numerous geographic entities: cities, buildings, countries, rivers, lakes, etc.

Packages, packages, packages

We need to extend R in order to work with geographic data (geo-data) in R by installing these packages:

  • maps: contains geo-data for national borders and administrative regions for several countries
  • rgeos and maptools: Misc. functions for operations on geometries
  • sf: "Simple Features for R" – reading, writing and transforming spatial data

Install them if you haven't yet:

install.packages(c('maps', 'maptools', 'sf', 'rgeos'))

Simple features

Most packages for working with geo-data in R rely on the Simple Features standard (→ sf package).

Simple features describe how objects in the real world can be represented in computers in terms of their spatial geometry:

Features have a geometry describing where on Earth the feature is located, and they have attributes, which describe other properties.
sf package vignette

Examples:

  • geometry: point at -73.94°, 40.6°
  • name: New York City
  • population: 8.6 mio.
  • geometry: polygons with of points at …
  • name: Italy
  • most popular meal: pizza

A spatial dataset is a dataset that contains geo-data (in a geometry column) and attributes of that entity like population, poverty rate, etc.

Making a world map

First, we load the packages that we need:

library(ggplot2)
library(maps)
library(sf)

Making a world map

The function map can be used to load the "world" data. We need to convert it to a "simple features" object via st_as_sf():

worldmap_data <- st_as_sf(map('world', plot = FALSE, fill = TRUE))
head(worldmap_data, n = 3)  # to have a look inside
## Simple feature collection with 3 features and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -8.68335 ymin: 18.98662 xmax: 74.89131 ymax: 42.64795
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##                         geometry          ID
## 1 MULTIPOLYGON (((74.89131 37... Afghanistan
## 2 MULTIPOLYGON (((20.06396 42...     Albania
## 3 MULTIPOLYGON (((8.576563 36...     Algeria
  • spatial dataset with a "header" giving some general information
  • two columns: geometry (spatial data) and ID (attribute)

What do the numbers in the geometry column mean? What's their CRS?

Making a world map

We are ready to plot the world map. Every "simple features" object can be plotted by adding a geom_sf layer:

ggplot() + geom_sf(data = worldmap_data)

Putting points on the map

We have some cities along with their longitude (lng) and latitude (lat):

##       name    lng    lat
## 1   Berlin  13.38  52.52
## 2 New York -73.94  40.60
## 3   Sydney 151.21 -33.87

We add a "point geom" layer to our map:

ggplot(some_cities) +                                # pass the cities data to plot
    geom_sf(data = worldmap_data) +                  # layer 1: world countries
    geom_point(aes(x = lng, y = lat), color = 'red') # layer 2: points at city coord.

Adding labels next to the points

You may also add text labels for the cities. ggplot2 provides geom_text and geom_label (draws box around text):

ggplot(some_cities) +                                  # pass the cities data to plot
    geom_sf(data = worldmap_data) +                    # layer 1: world countries
    geom_point(aes(x = lng, y = lat), color = 'red') + # layer 2: points
    geom_label(aes(x = lng, y = lat, label = name),    # layer 3: labels
               hjust = 0, vjust = 1, nudge_x = 3) +    # labels appear next to point
    theme(axis.title = element_blank(),
          axis.text = element_blank())                 # disable the axis labels

Excercise 1

TODO: Anpassen / 2 mgl. Übungen?

Visualizing WZB partner institutions on a world map

Make a script that:

  1. Load the CSV file wzb_partners_data.csv
  2. Load the data for the world map
  3. Remove the shape (i.e. the "feature") for Antarctica from the world map data
  4. Make a plot, showing the partner institutions with a dot on the world map (optional: make the color dependent on the "type" column)
  5. Optional: Group institutions by city and count institutions per city; show the city locations in Europe only as a dot, make the dot size dependent on number of institutions

See further hints on handout.

Sources and file formats for geo-data

Administrative levels and identifiers

Suppose you have observations that refer to geographic locations, e.g. countries, cities, neighborhoods. You may want to:

  • link that data with existing information on that geographic entity e.g. country GDP, city population, poverty rate in the neighboorhood
  • visualize that data geographically

For both things you need to:

  1. agree on geographic (administrative) level for matching (e.g. country, state, municipality)
  2. find identifiers to match the observations w/ respective geographic entities

TODO: schematische darstellung

Country names, country codes

  • different names may refer to the same country (e.g. Germany / Federal Republic of Germany, BRD, etc.) → often not a good identifier
  • ISO 3166-1 designates every country a two- or three-letter code (e.g. DE / DEU)
  • often used in datasets (e.g. from the UN)

Finding NUTS in the EU

The Nomenclature of Territorial Units for Statistics (NUTS) divides the EU territory into regions at 3 different levels for socio-economic analyses of the regions.

Candidate countries and countries inside EFTA also have NUTS regions (Norway, Switzerland, Albania, Turkey, etc.).

NUTS levels
source: Eurostat

Finding NUTS in the EU

AGS in Germany

In Germany, regions are hierarchically structured and identified by "Amtlicher Gemeindeschlüssel" (AGS ~ "municipality identificator"). The Gemeindeverzeichnis contains all regions with their name, AGS, area, population and other features.

Berlin: LOR codes

Since 2006, Berlin uses "Lebensweltlich orientierte Räume" (LOR) to structure the city area into sub-regions at different levels, each of them identified by "LOR" code:

Level 1: Prognoseräume (60 regions); Level 2: Bezirksregionen (138 regions); Level 3: Planungsräume (447 regions)

What if I have no geo-identifier?

If your observations don't contain the identifiers you need, you can still do the following as long as you have some information suitable for geo-coding (e.g. address, city name, institution name to convert to a geo-coordinate):

  1. find out the longitude/latitude coordinate for each observation
  2. match each coordinate to its surrounding geographic entity (i.e. a municipality) by doing a "point-within-polygon" test

Sources for geo-data

Have look in the handout document, where geo-data sources at world- and EU-level (NUTS) as well as sources for Germany (AGS) and Berlin (LOR) are presented.

File formats

There are numerous file formats that store data for geographic information system (GIS) software. The most popular include:

  • ESRI shapefile ("SHP"):
    • consists of at least three files in the same directory with filename extensions .shp, .shx and .dbf
  • GeoJSON (.geojson, .json) and TopoJSON (.topojson, .json)
  • KML (.kml, .kmz): Keyhole Markup Language, used by Google Earth
  • GML (.gml): Geography Markup Language
  • AutoCAD DXF (.dxf)

You may also encounter other formats such as vector data files (.svg, .poly) or databases (.sql, .sqlite).

Good news: sf can handle all popular file formats.

Exploring geo-data files with QGIS

QGIS is a free, open-source GIS software running on all major operating systems.

  • useful for exploring downloaded geo-data files
  • I will show how to load a geo-data file and investigate properties of individual shapes
  • if you've installed QGIS, you can follow along

QGIS screenshot

Lunch break